SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data

Authors

  • Hagit Shatkay
  • Annette Höglund
  • Scott Brady
  • Torsten Blum
  • Pierre Dönnes
  • Oliver Kohlbacher

Abstract

MOTIVATION: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations.

RESULTS: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence-based and text-based features. While applying the widely used support vector machines (SVMs), SherLoc's main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previously reported results. We also report the results of applying SherLoc to a large set of yet-unlocalized proteins.

AVAILABILITY: SherLoc, along with Supplementary Information, is available at: http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc/
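The abstract says that SherLoc combines sequence-based and text-based features for SVM classification but gives no details of how. Purely as an illustration, the Python sketch below shows one generic way to integrate the two kinds of evidence at the feature level: a 20-dimensional amino acid composition vector is concatenated with L2-normalized term counts from text associated with the protein. The function names, the toy vocabulary, and the choice of simple concatenation are assumptions made for this example. They are not SherLoc's actual feature set or integration scheme, which the full paper describes. The resulting vectors would then be handed to an SVM implementation.

```python
from collections import Counter
import math

# The 20 standard amino acids, used for a simple composition feature.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def text_features(text: str, vocabulary: list[str]) -> list[float]:
    """L2-normalized counts of each vocabulary term in the associated text."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def integrated_features(sequence: str, text: str, vocabulary: list[str]) -> list[float]:
    """Concatenate sequence-based and text-based features into one vector."""
    return aa_composition(sequence) + text_features(text, vocabulary)


# Hypothetical toy vocabulary; a real system would select terms from a corpus.
VOCAB = ["mitochondrial", "nucleus", "membrane", "secreted"]
x = integrated_features("MKTAYIAKQR", "Localized to the mitochondrial matrix.", VOCAB)
print(len(x))  # 24 = 20 composition values + 4 text features
```

Normalizing each block separately keeps a long text record from swamping the composition features. A practical system might instead weight the blocks, or train separate classifiers and combine their outputs, rather than concatenating raw features.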

Similar resources

Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data

Computational prediction of protein subcellular localization is a challenging problem. Several approaches have been presented during the past few years; some attempt to cover a wide variety of localizations, while others focus on a small number of localizations and on specific organisms. We present a comprehensive system, integrating protein sequence-derived data and text-based information. Iti...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of the protein localization is among the most important issues in the bioinformatics that is used for the prediction of the proteins in the cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied for the prediction of the intracellular protein locations. These algorithms use the features extracted from pro...

Comparative in silico analyses of proteins involved in serum resistance as promising vaccine candidates against Acinetobacter baumannii

Introduction: Acinetobacter baumannii as a Gram-negative coccobacillus has become a major cause of hospital-acquired infections. The virulence factors involved in serum resistance are important targets in the development of an effective vaccine against this pathogen. Our aim in this project was in silico analyses of A. baumannii proteins involved in serum resistance which could potentially be u...

Protein Subcellular Localization Prediction for Fusarium graminearum∗

The fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae) is the causal agent of several destructive crop diseases. Investigating subcellular localizations of F. graminearum proteins can provide insight into pathogenic mechanisms underlying F. graminearum-host interactions. In this paper, we design a novel balanced ensemble classifier based on support vector machines (SVMs) to predic...

SherLoc2: a high-accuracy hybrid method for predicting subcellular localization of proteins.

SherLoc2 is a comprehensive high-accuracy subcellular localization prediction system. It is applicable to animal, fungal, and plant proteins and covers all main eukaryotic subcellular locations. SherLoc2 integrates several sequence-based features as well as text-based features. In addition, we incorporate phylogenetic profiles and Gene Ontology (GO) terms derived from the protein sequence to co...

Journal:
  • Bioinformatics

Volume 23, Issue 11

Publication date: 2007